quantization problem
Clustering with Bregman Divergences: an Asymptotic Analysis
Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the limiting distribution of the centers as k, extending well-known results for k-means clustering.
Clustering with Bregman Divergences: an Asymptotic Analysis
Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the limiting distribution of the centers as k, extending well-known results for k-means clustering.
Soft Quantization using Entropic Regularization
Lakshmanan, Rajmadan, Pichler, Alois
The quantization problem aims to find the best possible approximation of probability measures on ${\mathbb{R}}^d$ using finite, discrete measures. The Wasserstein distance is a typical choice to measure the quality of the approximation. This contribution investigates the properties and robustness of the entropy-regularized quantization problem, which relaxes the standard quantization problem. The proposed approximation technique naturally adopts the softmin function, which is well known for its robustness in terms of theoretical and practicability standpoints. Moreover, we use the entropy-regularized Wasserstein distance to evaluate the quality of the soft quantization problem's approximation, and we implement a stochastic gradient approach to achieve the optimal solutions. The control parameter in our proposed method allows for the adjustment of the optimization problem's difficulty level, providing significant advantages when dealing with exceptionally challenging problems of interest. As well, this contribution empirically illustrates the performance of the method in various expositions.
Non-asymptotic convergence bounds for Wasserstein approximation using point clouds
Merigot, Quentin, Santambrogio, Filippo, Sarrazin, Clรฉment
Several issues in machine learning and inverse problems require to generate discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ points which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm -- in which Voronoi cells are replaced by Power cells -- leads to configurations with small Wasserstein error. This is surprising because, again, of the non-convex nature of the problem, as well as the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. This already works after one step of the iteration procedure, and similar bounds can be deduced, for the corresponding gradient descent. These bounds naturally lead to a modified Poliak-Lojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.
Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers
Soflaei, Masoumeh, Guo, Hongyu, Al-Bashabsheh, Ali, Mao, Yongyi, Zhang, Richong
We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call "IB learning". We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a "vector quantization" approach, namely, simultaneously learning the representations of multiple input objects. Such an approach assisted with some variational techniques, result in a novel learning framework, "Aggregated Learning", for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks. Introduction The revival of neural networks in the paradigm of deep learning (LeCun, Bengio, and Hinton 2015) has stimulated intense interest in understanding the networking of deep neural networks, e.g., (Shwartz-Ziv and Tishby 2017; Zhang et al. 2017). Among various efforts, an information-theoretic approach, information bottleneck (IB) (Tishby, Pereira, and Bialek 1999) stands out as a fundamental tool to theorize the learning of deep neural networks (Shwartz-Ziv and Tishby 2017; Saxe et al. 2018; Dai et al. 2018). Under the IB principle, the core of learning a neural network classifier is to find a representation T of the input example X, that contains as much as possible the information about X and as little as possible the information about the label Y .
Aggregated Learning: A Vector Quantization Approach to Learning with Neural Networks
Guo, Hongyu, Mao, Yongyi, Zhang, Richong
We establish an equivalence between information bottleneck (IB) learning and an unconventional quantization problem, "IB quantization". Under this equivalence, standard neural network models correspond to scalar IB quantizers. We prove a coding theorem for IB quantization, which implies that scalar IB quantizers are in general inferior to vector IB quantizers. This inspires us to develop a learning framework for neural networks, AgrLearn, that corresponds to vector IB quantizers. We experimentally verify that AgrLearn applied to some deep network models of current art improves upon them, while requiring less training data. With a heuristic smoothing, AgrLearn further improves its performance, resulting in new state of the art in image classification on Cifar10.
Clustering with Bregman Divergences: an Asymptotic Analysis
Clustering, in particular $k$-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of $k$-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters $k$ is large. We establish quantization rates and describe the limiting distribution of the centers as $k\to \infty$, extending well-known results for $k$-means clustering.